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Abstract I discuss the effects of measurement error on regression and density es- 
timation. I review the statistical methods that have been developed to correct for 
measurement error that are most popular in astronomical data analysis, discussing 
their advantages and disadvantages. I describe functional models for accounting for 
measurement error in regression, with emphasis on the methods of moments ap- 
proach and the modified loss function approach. I then describe structural models 
for accounting for measurement error in regression and density estimation, with em- 
phasis on maximum-likelihood and Bayesian methods. As an example of a Bayesian 
application, I analyze an astronomical data set subject to large measurement errors 
and a non-linear dependence between the response and covariate. I conclude with 
some directions for future research. 



1 Introduction 

Measurement error is ubiquitous in astronomy. Astronomical data consists of pas- 
sive observations of objects, whereby astronomers are able to directly measure the 
flux of an object as a function of wavelength, its location on the sky, and the time 
of the observation. Because the number of photons detected from an astronomical 
object follows a Poisson process, this makes the measurement of a source's inten- 
sity intrinsically subject to measurement error, even if none is introduced from the 
detector. Therefore, the very nature of astronomical data makes measurement er- 
ror unavoidable. Moreoever, quantities that are derived from an object's observed 
emission, either by fitting a model to the spectral energy distribution (SED) or by 
employing scaling relationships, are also 'measured' (derived) with error. Examples 
include mass, metallicity, and distance. Often the measurement error on the derived 
quantities is significant. This is unfortunate as inference on the derived quantities is 
often the goal of astronomical data analysis. Therefore, there has been considerable 
interest in how to perform statistical inference in the presence of measurement error 
Measurement error is a problem that affects, at various levels, all scientific re- 
search. Because of this, numerous methods for handling measurement errors have 
been developed (Fuller 1987, Cheng & Van Ness 1999, and CarroU et al. 2006 are 
good references). In this contribution, I will present a survey of methods for han- 
dling measurement error that have been developed and used in astronomical data 
analysis. Because astronomical measurement errors are, in general, heteroskedas- 
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tic (having different variances), I will limit my discussion to methods developed 
for heteroskedasticity. I will focus on situations where a deterministic relationship 
is not assumed between the variables, but where all variables of interest are ran- 
dom and are measured with error Because of this, I will ignore situations where 
the measurement error is the only source of randomness in one's data. An exam- 
ple of this type of situation is fitting a model to an observed spectrum, where the 
measurement error is the only source of randomness; i.e., in the absence of mea- 
surement error a deterministic relationship is assumed between, say, flux density 
and wavelength. Methods for handling measurement error in this case are relatively 
well-established, and typically one simply minimizes the usual statistic (e.g., 
Bevington 2003). However, it is worth pointing out that many complications may 
still exist, and more sophisticated methods may be needed, especially when dealing 
with low-count X-ray and 7-ray data (e.g., van Dyk et al. 2001) or when incorporat- 
ing calibration unceratinties (Lee et al. 2011). Instead, I will focus on methods for 
analyzing data from astronomical samples, where the variables are a random sam- 
ple from an underlying distribution. Within the context of regression, this implies 
that intrinsic scatter (referred to as equation error in the statistics literature) exists 
in the relationship among the variables, and thus a deterministic relationship is not 
assumed between the variables even without the presence of measurement error 

Most of the techniques I will discuss focus on accounting for measurement er- 
ror in regression. The goal of regression is often to understand how one variable 
changes with another. For example, how does the mass of a black hole change as a 
function of the stellar velocity dispersion of the host galaxy's bulge? Typically one 
simply estimates how the average value and dispersion of one variable depends on 
another. Measurement error statistical models are typically divided into two types: 
'functional' and 'structural' models. In functional modeling, one assumes that the 
unknown true values of the variables are fixed, whereas in structural modeling the 
unknown true values of the variables have their own intrinsic distribution. As a re- 
sult, in structural modeling one must parameterically model the distribution of the 
true values of the variables, whereas in functional modeling one does not. Density 
estimation is another important technique in astronomical data analysis, being the 
foundation for luminosity and mass function estimation. The methods I will discuss 
for handling measurement error in structural models are also applicable to density 
estimation, as in this case regression and density estimation are based on the same 
formalism. When discussing regression methods, I will refer to the 'dependent' vari- 
able as the response, and the 'independent variables' as the covariates. 



2 Notation and Error Model 

Throughout this work I will denote the measured response for the ;* data point as 
yi, and the measured covariate for the /* data point as x,-. I will denote the true 
values as T], and respectively. If there are /;> > 1 covariates, then I will use the 
vectors Xj and ^j. I will use y to denote the set of values of y, for each of the « data 
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points, y = bi , • • • ,yn]- To denote the set of I will use x = jcj , . . . ,x„ if there is 
one covariate, and the nx p matrix X — [xi, . . . ,Xn] if there are multiple covariates. I 
assume the classical additive error models throughout this review, unless otherwise 
specified: 



The function f{£, , 6) describes how the mean value of rj depends on ^ as a function 
of the parameters, 6. For example, for linear regression /{£,, 0) = a + ^ with 
9 = (o:, j3) denoting the slopes and intercept. The terms e,-, £x.i, and e,,, are random 
variables denoting the intrinsic scatter in 7] at fixed ^ (i.e., the equation error), the 
measurement error in Xj, and the measurement error in y,, respectively. The random 
variables Ex.i, and are assumed to have zero mean and variances Var{Ei) = 
(J^ ,Var{eyj) = Gy-, and Var{ex,i) = Exj. As is typical in astronomy, the parameter 
<7^ is assumed to be unknown and a free parameter in the model, while the variances 
in the measurement errors, a^f, and Ex.i, are assumed known. The measurement 
errors are assumed to be independent of e,. In addition, for simplicity I also assume 
that the measurement errors in and Xj are independent, unless otherwise specified. 
However, this is not always true, and many methods are able to handle correlated 
measurement errors, see the references for individual techniques for further details. 

Following Gelman et al. (2004), I will also typically use the notation /?(■) to de- 
note the probability density of the argument. For example, p{x) denotes the marginal 
probability density ofx, p{y\x) denotes the conditional probability density of y given 
X, and p{y,x) denotes thejoint probability density ofy andjc. It should be understood 
that /?(•) will not always have the same functional form, and that this must be in- 
ferred from context, i.e., it is not necessarily true that p{x) = p{y) even if x — y. 
When this may be confusing, I use different symbols to denote different probability 
densities. 



3 Effects of Measurement Error 

Measurement error has the effect of blurring and broadening the distribution of 
quantities, similar to the blurring of astronomical images by a point spread function. 
This makes statistical inference based on the measured values biased, and smears out 
any trends in the data. The distribution of the measured quantities is obtained as 
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Under the additive error model of § |2l Equation (|4]i simplifies to 
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Fig. 1 Illustration of the effect of measurement error on regression and density estimation, using 
a simulated sample. The true distribution of the response and covariate (left), compared with the 
measured distribution (center). The error bars in the center plot denote the standard deviation of 
the Gaussian measurement errors. The measurement errors have effectively washed out any visual 
evidence for a tight non-linear relationship between the response and covariate. The right plot 
shows the distribution of the tme and measured values of the covariate. The measurement errors 
have washed out any evidence for bimodality in the distribution, and significantly broadened it. 



p{y,x)= j f(y-n) j g{x-^)p{n,^)d^cm, (5) 

where /(■) and g{-) denote the probability distributions of the measurement errors 
Ey and e^, respectively. Equation ^ shows that under additive measurement error, 
the observed distribution of a set of quantities is the convolution of the intrinsic 
distribution with the measurement error distribution. Convolution has the effect of 
broadening distributions, which biases density estimation and masks trends. 

Some of the effects of measurement error are illustrated in Figure [T] Here, I 
simulated a sample of covariates from a bimodal distribution, and simulated the 
response assuming a nonlinear relationship between r\ and ^ . I then added large 
measurement error to both Tj and ^ . As can be seen, measurement error has blurred 
out many of the features in the data set, and broadened the distributions. 

To further see how measurement error biases statistical inference for regression, 
consider the additive error model for linear regression, assuming one covariate. 
In addition, for simplicity assume that the measurement errors are homoskedastic 
(having the same variance) for both the response and covariate. If one were to ig- 
nore measurement error and proceed through the usual ordinary least-squares (OLS) 
analysis, then one would obtain the following estimates for the slope, variance in the 
intrinsic scatter, and uncertainty in the estimated slope (assume the intercept, a, is 
known); 

3 ^ Cov{x,y) ^ Cov((^,T7) 

'^^'■^ Var{x) Var{^) + ai ^' 
&dLS = Var{y -a- poLS^) = (jS^ - P^^^s)Var{^ ) + P^LS^^^ + + cj^ (7) 
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(8) 



Var{x) Var{^ ) + a} 



■2' 



where j3 and are the true values of the slope and variance in intrinsic scatter. 
From Equations ©-([Sl) we can deduce the following: 

• Equation Q shows that measurement error in the covariate attenuates the regres- 
sion slope, biasing it toward zero. Therefore, trends between the response and the 
covariate will appear weaker than they really are. If the measurement error in the 
covariate is negligible, then there is no bias in the slope even if the measurement 
errors in the response are large. 

• Equation ^ shows that measurement error in both the response and covariate 
bias the estimate of upward. Therefore, the variance in the response about the 
regression line will appear larger than it really is. 

• Equation (O show that measurement error in the covariate causes one to under- 
estimate the error in the estimated slope. Thus, if the covariate is significantly 
contaminated by measurement error, then one would incorrectly conclude that 
the slope is precisely estimated to be « 0, and therefore conclude that there is no 
relationship between the response and covariate! 

Clearly measurement error can have a significant effect on one's data analysis, and 
ignoring it can lead to erroneous conclusions. Luckily, a number of statistical meth- 
ods have been developed for handling measurement errors. 



4 Functional Methods for Accounting for Measurement Error in 
Regression 

A variety of functional models have been proposed for handling measurement er- 
rors in regression, and here I summarize the methods that are commonly used in the 
astronomical literature. Since heteroskedastic measurement errors are the norm in 
astronomy, I only discuss methods that allow the variances in the measurement error 
to vary among the observations. Moreover, as discussed earlier, I focus on methods 
that incorporate intrinsic scatter in the relationship between the response and covari- 
ate. The reader is referred to Carroll et al. (2006) for a more thorough and general 
discussion of methods developed for handling measurement error 



4.1 Method of Moments Approach for Linear Regression 

In linear regression the least-squares estimates of the intercept, slope, and intrin- 
sic dispersion are obtained from the moments of the data. In the previous section I 
showed that the moments of the observed data are biased estimates of the moments 
of the intrinsic distribution when the data are measured with error Therefore a sim- 



6 



Brandon C. Kelly 



pie method of accounting for measurement error in linear regression is to estimate 
the moments of the true values of the data, and then use these estimated moments 
to estimate the regression parameters. This is the idea behind the method of mo- 
ments (MM) estimators, where the moments of the observed data are 'debiased' by 
removing the contribution from the measurement errors. 

Akritas & Bershady (1996) describe a methods of moments approach for lin- 
ear regression that handles heteroskedastic measurement error in both the response 
and covariate, intrinsic scatter, and correlation between the response and covari- 
ate measurement error. Akritas & Bershady used their method to characterize the 
color-luminosity and Tully-Fisher relationships for galaxies. Their estimators, as is 
typical for the method of moments, assume the additive error model of §|2]with the 
mean value of 77 depending linearly on ^ : /(^ ,9) = a + . They do not assume 
a particular distribution for the measurement errors, the covariate, or the intrinsic 
scatter. However, their approach does assume that the variance in the measurement 
errors and correlation between the measurement errors are known. They call their 
estimator the BCES estimator, for bivariate correlated errors and intrinsic scatter. 

Denote the covariance between the measurement errors in the response and co- 
variate as Co\'{ey,i,exj) = C7yv,,-. Also, denote the sample average for x as X, the 
sample average for y as Y, the sample variance for x as Vx, the sample variance for 
y as Vy, and the sample covariance between x and y as Then, the methods of 
moments estimators are 

a _ Vxy - 

Pmm — — r,- (y) 

ccmm^Y-PmmX (10) 

where Gyx = L"=i '^yx.i/n and — Y!l=\ <^xih^- Akritas & Bershady (1996) show 
that the MM estimators are asymptotically unbiased, that the sampling distribution 
of the MM estimators is asymptotically normal, and describe how to estimate the 
asymptotic covariance matrix of OLmm and jS^M- Patriota et al. (2009) derive the 
asymptotic covariance matrix of the MM estimators under the additional assump- 
tion that the measurement errors are normally distributed, creating more powerful 
hypothesis testing when this is true. In addition, Cheng &. Riu (2006) give the MM 
estimator for the variance in the intrinsic scatter: 

^llM = yy-^MM{yxy-Oyx)-dy, (11) 

where of is the sample average of a^-. 

The main advantage of the MM estimators are that they do not make any assump- 
tions about the distribution of the measurements errors, about the distribution of the 
covariate, nor about the distribution of the intrinsic scatter This is attractive, is it 
makes the MM estimators robust. One of the disadvantages of the MM estimators 
is that they are not as precise as some other methods, such as structural models, 
when the distributions of e,, ey,e, and ^ are known, or at least when they can be 
accurately modeled parameterically, as the MM estimators do not impose prior as- 



Measurement En'or Models in Astronomy 



7 



sumptions about the distributions. Another disadvantage is that the MM estimators 
tend to be highly variable when the sample size is small, and/or the measurement 
errors are large. This is on account of the term — o:: in the denominator of the 
equation for Pmm- When the sample size is small, then is more variable, and it is 
possible that Vx ^ O^- This is also possible when measurement errors are large, as 
the variance in x becomes dominated by the measurement errors. When this occurs, 
the estimate for the slope can become very large, or change sign. Similarly, if the 
measurement errors in y are large, then the MM estimator for the intrinsic disper- 
sion can become negative, which is impossible. Therefore, despite the robustness 
of the MM estimators, more stable estimators should be used when the sample size 
is small, or when the measurement errors make up a significant component to the 
variance in the data. 



4.2 Modified Loss Function Approach 

Modified loss function methods modify the figure of merit function (i.e., the 'loss' 
function), to incorporate measurement error The weighted squared error loss func- 
tion is the most common loss function used in astronomy. A weighted least squares 
(WLS) estimator for linear regression was proposed by Sprent (1966) to minimize 
the following loss function for the special case of no intrinsic scatter; 

The weights in Equation ( fT2] i reflect the contribution of the measurement errors to 
the squared error. Here I have used the notation 2wLs(o!,j8) instead of the more 
commonly used to emphasize the fact that Equation (fTSl i is a loss (or figure of 
merit) function, and will not necessarily follow a distribution even if the errors 
are Gaussian (although one can still use Equation (fTZt regardless of the distribution 
of the measurement errors). Note that this implies that one cannot derive uncertain- 
ties in the parameters by looking for regions of constant AQwls{(^tI^)- with the 
method of moments estimators, the WLS estimators do not make any assumptions 
about the distribution of the measurement errors, covariate, or intrinsic scatter. 

The loss function defined by Equation ( fT2b assumes that there is no intrinsic 
scatter in the relationship between the response and covariate. How then to modify 
Equation ( fT2] i to include the intrinsic scatter? Motivated by their work on character- 
izing the Mb//-(7* relationship, Tremaine et al. (2002) suggested using the following 
modified WLS loss function: 
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While the addition of to the denominator of Equation ( fT3] l is intuitive, as it 
reweights the loss function to incorporate the intrinsic scatter, the unknown value 
of (7^ creates difficulties for the WLS estimator based on 2wL5(o!,j3,(7^). As dis- 
cussed in Kelly (2007), Equation (flJl l can only be minimized with respect to a and 
/3 at fixed a^, as the minimum of Equation ( fT3] l occurs at — >^ oo for any value 
of a and )3 . Clearly, one cannot estimate the regression parameters by minimizing 
2wLs(o:, j3, (7^). Instead, the most common approach (as suggested by Tremaine et 
al. (2002)) is to initially use (7^ = 0, and then find the values of a and j3 which min- 
imize Equation ( fT2] i. Then, using these best-fit values for a and j3, C7^ is estimated 
by finding the value such that QwLs{oc,li,(J^)/{n — 2) = 1. Unfortunately, the prop- 
erties of the WLS estimator based on this procedure, such as its bias and asymptotic 
distribution, are unknown. Kelly (2007) performed simulations to study the behav- 
ior of the WLS estimator based on Equation (fTJI i when the data are contaminated 
by large measurement error, and compared with the MM estimator and a maximum- 
likelihood estimator (see ij lS.lb . In general, the WLS estimator gave biased values 
for the slope, while the MM estimator for the slope was approximately unbiased 
except in the limit of extreme measurement error, and the maximum-likelihood 
estimator was approximately unbiased except in the limit of a small sample with 
extreme measurement error. Therefore, based on the problems associated with the 
WLS estimator based on Equation (T3[ . I do not recommend its use. 

While the modification to the least squares loss function by Equation ( fTSl l ex- 
hibits some problems, it is still possible to derive consistent estimators for the re- 
gression parameters by modifying the least squares loss function. Instead, consider 
the following modified loss function: 

e(a, ^ , a^) = -i f [{yi - « - -^^al] . (14) 

" 1=1 

Equation ( fT4] i corrects the usual least-squares loss function by subtracting off the 
contribution to the squared error from the measurement errors, and is therefore an 
estimate of the loss function that would have been obtained if there was no measure- 
ment error Minimization of Equation (fT4l) with respect to {a,P,a^) results in the 
MM estimators given by Equations (l9]l- (fTTl i (Cheng & Van Ness 1999). Therefore, 
the method of moments estimators can be understood as resulting from a corrected 
least squares loss function. 

Thus far I have focused on linear regression. However, there are cases where a 
non-linear relationship may exist between the average value of the response and the 
covariate, and one desires to use a functional model. Patriota & Bolfarine (2008) de- 
scribe a corrected score method for polynomial regression under the heteroskedastic 
additive error model (§|2]i, which they applied to an astronomical data set. The reader 
is referred to their work for further details. 
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5 Structural Methods for Regression and Density Estimation 



Structural models for regression are those that make assumptions about the dis- 
tribution of the covariate. As such, they are also applicable to density estima- 
tion. I will focus on structural models that rely on the construction of a likelihood 
functioifl therefore requiring one to specify a parameteric model for the distribu- 
tions of the measurement errors, intrinsic scatter, and covariates. These methods 
include both maximum-likelihood estimators and Bayesian methods. Likelihood- 
based techniques have the advantage that they are flexible and may be applied to 
a variety of problems, including those requiring non-linear forms for /(^ , 6), vari- 
ance in intrinsic scatter that depends on the covariate, and data sets that include 
censoring and truncation. However, they have the disadvantages that they are com- 
putationaly expensive, and that one must assume a parameteric form for all dis- 
tributions involved, decreasing their robustness. That being said, it is possible to 
use highly flexible parameteric forms, increasing the robustness of likelihood based 
methods (Huang et al. 2006). Moreover, the additional assumptions involved in the 
parameteric modeling typically buys one an increase in efficiency, providing smaller 
standard errors for the maximum-likelihood and Bayesian estimators when the pa- 
rameteric statistical model is a good description of the data. 



5.7 Constructing the Likelihood Function 

The basic idea behind likelihood-based methods is to treat the measurement errors 
as a missing data problem. Little & Rubin (2002) describe methods for handling 
missing data, while Gelman et al. (2004) describe Bayesian approaches to the miss- 
ing data problem. First, one formulates the likelihood function for the complete 
data, i.e., the likelihood function for both the measured and true values of the data. 
In general, for regression we have the foUowing hierarchical model: 

^i-pi^lw) (15) 
T7,|^,-/7(tj|^,0) (16) 
y,-,x,|Tji,^; -/7(3;,x|tj,^). (17) 

The notation z ^ piz) means that the random variable z is drawn from the probabil- 
ity distribution p(z)- The distributions p{^\\i/),p{ri\£,,6), and p{y,x\ri,^) are the 
distributions for the covariates, the response given the covariate, and the measured 
data, respectively. The distribution for the covariate is parameterized by xj/, while 
the distribution for Tj at a given ^ is parameterized by 0; note that here I have ab- 
sorbed the parameter describing the variance in the intrinsic scatter into 9, whereas 



The likelihood function is the probability of observing the data, given some parameters. It re- 
quires assuming a parameteric form for the sampling distribution of the data. 
^ Data are said to be censored when only an upper or lower limit is available. 
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in the previous sections I have kept <t^ seperate from 0. For simpHcity, I assume that 
the distribution of the measurement errors is considered known, as is typically the 
case in astronomy. If additional parameters are needed to describe the distribution of 
the measured data, e.g., if the variance in the measurement errors is unknown, then 
these should be included in Equation (fTTT i. Most of the interest in regression lies in 
inference on 9, which describes how the response depends on the covariates. If, in- 
stead of regression we are interested in density estimation, then there is no response 
variable and only Equations ( fTSI l and (TU are used. 

Under the statistical model given by Equations (fT5ll-(fT7]i. the complete data like- 
lihood function for the ;* data point is 

p{yi,'!ii,'ni,^i\0,\i/)^ p{yi,Xi\rii,^i)p{rii\^i,e)p{^i\\l/). (18) 

In order to calculate the observed data likelihood function for the data point, we 
integrate out the missing (and thus unknown) data from the complete data likelihood 
function: 

p{yt,x,\e,w)^ I lp{yi,Xi\ri,^)p{ri\^,e)p{^\xif)dr]d^ (19) 

When the data points are statistically independent, as is almost always the case, 
the observed data likelihood function for the entire data set is the product of Equa- 
tion (T% over the ; = data points. Further details on this procedure can 
be found in Carroll et al. (2006). Once one has chosen parameteric forms for the 
distributions involved in Equations (fT5]l-(fT7]i. one can use Equation ( fT9] l to com- 
pute the maximum-likelihood estimate for the parameters {OjXj/) and use the like- 
lihood ratio to estimate confidence regions for the parameters. That's it! Of course, 
in practice this is not so simple, as computing the integrations involved in Equation 
( fT9l ) and performing the optimization of Equation iT% can be numerically difficult. 
The Expectation-Maximization (EM) algorithm is often helpful, and additional nu- 
merical techniques are described in, for example. Press et al. (2007) and Robert & 
Casella (2004). 

As an example of the likelihood approach, consider the following simple model. 
Assume the measurement errors to be normally distribution with zero mean and 
known variances, as described in § |2] For the regression model, assume that the 
response (tj) at fixed covariate {£,) is normally distributed with mean f{£,,0) = 
a + and variance (7^; this is the usual linear regression model with Gaussian 
intrinsic scatter. The distribution of the covariates is assumed to be a p-dimensional 
multivariate normal density with mean fi and covariance matrix T. Under this 
model, the parameters are = (a,j3,a^) and Xj/ = For this model, the inte- 

grals in Equation ( fT9] l can be done analytically. Denoting z, = (y/jX,), the measured 
data hkelihood is 

C = (a + j3^Ai,M) (21) 
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The Gaussian likelihood model described here is commonly used, but it is not 
robust and can be subject to considerable systematic error due to model mispecifi- 
cation (e.g., Huang et al. 2006). Motivated by this, several authors have proposed 
using a mixture of Gaussian functions as a model for the distribution of the covari- 
ates (e.g., Carroll et al. 1999, Roy & Banerjee 2006, Kelly 2007). Bovy et al. (2009) 
describe a mixture of Gaussian functions model for density estimation when some 
of the measurements are missing at random. Kelly et al. (2008) describe a mixture 
of Gaussian functions model for density estimation of a truncated sample, with em- 
phasis on luminosity function estimation. The mixture of Gaussian functions model 
inherits much of the mathematical simplicity of the Gaussian model, enabling an an- 
alytic calculation of the observed data likelihood, while still being flexible enough 
to model most realistic astrophysical distributions. In addition, Andreon (2006) de- 
scribe a model for incorporating contamination from a background distribution, and 
model the distribution of the covariates as a mixture of Schechter function^ 



5.2 Bayesian Methods and an Example 

Bayesian methods build on the likelihood methods described in 15. H and compute 
the probability distribution of the parameters, given the observed data; this is called 
the 'posterior' distribution. This is done by first assuming a 'prior' distribution on 
the parameters, \//), where the prior distribution quantifies our information on 
the parameters Q and before we take any of the data. The posterior distribution is 
then related to the prior and the likelihood by 

p(0,V/|y,X) = p(0,r)p(y,X|0,V)- (23) 

For example, for the Gaussian model described by Equations (l20 l i- (l22l) . and as- 
suming a uniform prior on the parameters (/?(«, j3,a^,/i,r) ^ O, the posterior 
distribution for {a,P,<7^,lX,T) is proportional to Equation (l20l i as a function of 
these parameters. Bayesian methods differ from the frequentist likelihood methods, 
such as maximum-likelihood, in that the inclusion of the prior distribution enables 
one to calculate the probability of the parameters, given the observed data. This im- 
plies that, in theory, the posterior distribution is exact, and therefore uncertainties 
on the parameters are reliable and easy to interpret regardless of the sample size and 
complexity of the statistical model. In contrast, the maximum-likelihood methods 
compute a point estimate of the parameters, and then use various methods (e.g., the 
likelihood ratio or bootstrap) to estimate the sampling distribution of the parame- 



' The Schechter function is an unnormalized Gamma distribution. It is commonly used in astron- 
omy as a model for the number density of galaxies in the universe as a function of their luminosity. 
* Technically this is uniform subject to the conditions that > and |r| > 0. 
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ters, from which confidence regions are derived. The maximum-likeUhood methods 
are useful, but it can become difficult to estimate the sampling distribution when the 
sample size is small, or for highly complex and difficult models. 

Bayesian methods have become increasingly popular in astronomy, as well as in 
other scientific disciplines. The primary driver of this increase in popularity has been 
the advancements in statistical computing that have enabled Bayesian inference, 
namely the use of Markov Chain Monte Carlo (MCMC) methods. Details of MCMC 
methods may be found in Gelman et al. (2004) and Liu (2004), and for an example 
of an MCMC algorithm under linear regression and heteroskedastic measurement 
errors, see Kelly (2007). One of the primary advantages of MCMC methods is that 
they are modular, and we can divide the computational problem up into smaller com- 
putational problems that are easier to solve. Because the true values of the data are 
not known, they are treated as additional parameters, and thus can also be updated 
via MCMC. We can also incorporate upper and lower limits in a straightforward 
manner through this approach by treating their true values as missing data (Kelly 
2007), although the definition of upper limit in astronomy is not always straightfor- 
ward (Kashyap et al. 2010). These properties of MCMC samplers are a significant 
advantage of the Bayesian approach, as we avoid the integration over the true values 
of the data required in Equation (l9[ for the maximum-likelihood approach, and we 
obtain improved estimates for the true values of the data. In fact, often it is easier 
to program a MCMC sampler and perform Bayesian inference than it is to do the 
optimization and numerical integration required for maximum-likelihood. 

As an illustration of the Bayesian approach, I consider a data set from Constantin 
et al. (2011, in prep) comparing the X-ray photon index. Fx, with the luminosity 
relative to the Eddington limit (i.e., the Eddington ratio, L/LecIci) for a sample of 
Active Galactic Nuclei (AGNfl The measured data are shown in Figure 2a. The X- 
ray photon index provides a measure of how much energy is being released through 
soft X-rays as opposed to hard X-rays, and the Eddington Luminosity is the lumi- 
nosity at which outward radiation and inward gravitational pressure balance for a 
spherical geometry. This data set provides a good illustration of the power of the 
Bayesian approach, as the average value of the response exhibits a non-linear and 
non-monotonic dependence on the covariate, and the measurement errors are very 
large in both the response and covariate. The values of the Eddington ratio (i.e., the 
covariate) where the X-ray photon index (i.e., the response) changes its dependence 
on L/LecIcI are of particular interest, as models of black hole accretion flows suggest 
that the accretion flow geometry changes at certain critical values of the Edding- 
ton ratio. Because of this, and the non-linear appearance in the data, I have chosen 
to model the data using a segmented line with two knots, where the slope of the 
line changes at the knots. I modeled the intrinsic distribution of \ogLx / LEdd a 
mixture of three Gaussian distributions. To make the model robust against outliers, 
I assume that the both measurement errors and the intrinsic scatter follow a Stu- 
dent's t-distribution with eight and four degrees of freedom, respectively. I used the 
MCMC algorithms described in Chapter 9 of Carroll et al. (2006) and Kelly (2007) 

^ AGN are believed to be supemiassive black holes that are accreting gas and are located in the 
center of a galaxy. 
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Fig. 2 The measured values of Fx and / LeiUI compared with the region containing 68% of 

the posterior probability for the mean value of Fx at fixed Lx /L^jj (left). The data point with error 
bars is not real and only used to illustrate the typical size of the error bars. Also shown are the 
posterior mean values for the true values of Fx and logLx/^£f/rf> compared with the best-fitting 
segmented line (right). A non-linear trend is apparent in both the segmented line model and in the 
estimated distribution of Fx and logLx /Lfi^rf using the segmented line as a prior. 



as the basis for my MCMC sampler under this model, and include an ancillarity- 
sufficiency interweaving strategy for increased efficiency (Yu & Meng 2011). This 
MCMC algorithm produces both random draws of the parameters for the segmented 
line model from their posterior distribution, but also random draws of the true values 
of the Eddington ratio and photon index from their posterior distribution. 

The region containing 68% of the posterior probability on the mean value of Fx 
as a function of Lx/Le^^i is also shown in Figure 2a. The location of the knots are 
estimated to be logLx/LEdd = —6.65 ±0.25 and —3.91 ±0.21, respectively. The 
segmented line model of Fx at fixed Lx/LecUI is preferred over a simple line model, 
illustrating the complex dependence of Fx on Lx/LecUI- In Figure 2b I show the 
posterior mean values of Fx and logLx /Lecui, as well as the segmented line com- 
puted from the posterior mean for its parameters. The posterior mean estimates for 
the true (i.e., not measured) values of Fx and \ogLx / LecUI represent a more model- 
independent estimate of the dependence of the photon index on Lx jl^Edd- This rep- 
resents a real advantage of the Bayesian approach, as not only are we able to estimate 
the probability distribution of the parameters of interest, but we can also estimate 
the probability distribution of the true values of the data as well, conditional on our 
assumed statistical model, the measured values of the data, and the amplitude of the 
measurement errors. The non-linear trend is also apparent from the values of Fx and 
Lx/LEdd estimated from the Bayesian method. The knot at Lx/LEdd ~ 2 x 10^^ 
may represent the increasing prevalence of additional astrophysical components to 
the X-ray spectrum as the AGN becomes faintier, such as hot gas not associated 
with the AGN, while the knot at Lx/LEdd 10^^ may represent a change in the 
accretion flow geometry. Figure 2b suggest that the scatter in Fx at fixed Lx/LEdd 
increases near the knot at Lx/LEdd ~ lO^'*, which may be indicative of instabilities 
when the accretion flow changes geometry, or of uncorrected intrinsic absorption. 
Further analysis of this data set will be discussed in Constantin et al. (2011, in prep). 
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6 Outstanding Issues in Measurement Error Models for 
Astronomical Data: Directions for Future Research 

I will conclude by listing a couple of unsolved problems in dealing with measure- 
ment errors in astronomical data analysis, which I hope will lead to further research 
in this area. 

• Data Subject to Large, Non-Gaussian Measurement Errors. Non-gaussian 
errors are common in astronomical data, especially when one is analyzing a set 
of derived quantities. Often, the most physically-interesting quantities are those 
derived by fitting an astrophysical model to the measured flux values at various 
wavelengths. Often the unertainties in these derived quantities are large, skewed, 
or exhibit multiple modes. There is currently no well-established method for han- 
dling the measurement errors in this case, although Bayesian hierarchical models 
such as that proposed by van Dyk et al. (2009) hold promise. 

• Handling Measurement Errors in Massive Astronomical Data Sets. Current 
and planned astronomical surveys will provide an explosion of data, allowing 
one to construct data sets with millions to billions of objects, each with multi- 
ple quantities measured. Many powerful methods developed for data mining will 
be applied to these data, potentially providing a powerful route to knowledge 
discovery. Unfortunately, all of the quantities obtained from these data sets will 
be measured with error, and most methods developed for data mining of mas- 
sive data sets do not incorporate measurement error. This is especially a problem 
when dealing with derived quantities, which will likely require a more careful 
statistical analysis on account of their sometimes highly irregular error distribu- 
tions. Currently, algorithms, such as MCMC, that allow one to perform reliable 
statistical inference on complicated statistical models do not scale well to mas- 
sive data sets. If we want to perform inference on massive data sets subject to 
measurement error using more complicated and realistic statistical models, we 
will need advances on the computational side. 
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pubUcation, and Aneta Siemiginowska, Xiaohui Fan, and Tommaso Treu for help- 
ful comments on an earlier version of this manuscript. I acknowledges support by 
NASA through Hubble FeUowship grant #HF-5 1243.01 awarded by the Space Tele- 
scope Science Institute, which is operated by the Association of Universities for 
Research in Astronomy, Inc., for NASA, under contract NAS 5-26555. 
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